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Abstract. In this paper we analyze the blind deconvolution of an image and an unknown blur in a coded imaging 
system. The measurements consist of subsampled convolution of an unknown blurring kernel with 
multiple random binary modulations (coded masks) of the image. To perform the deconvolution, we 
consider a standard lifting of the image and the blurring kernel that transforms the measurements 
into a set of linear equations of the matrix formed by their outer product. Any rank-one solution to 
this system of equation provides a valid pair of an image and a blur. 

We first express the necessary and sufficient conditions for the uniqueness of a rank-one solution 
under some additional assumptions (uniform subsampling and no limit on the number of coded 
masks). These conditions are special case of a previously established result regarding identifiability 
in the matrix completion problem. We also characterize a low-dimensional subspace model for 
the blur kernel that is sufficient to guarantee identifiability, including the interesting instance of 
“bandpass” blur kernels. 

Next, assuming the bandpass model for the blur kernel, we show that the image and the blur kernel 
can be found using nuclear norm minimization. Our main results show that recovery is achieved 
(with high probability) when the number of masks is on the order of /x log 2 L log ^ log log (N + 1) 
where /x is the coherence of the blur, L is the dimension of the image, and N is the number of 
measured samples per mask. 

Key words, blind deconvolution, coded mask imaging, lifting, nuclear norm minimization 

1. Introduction. The blind deconvolution problem has been encountered in many fields 
including astronomical, microscopic, and medical imaging, computational photography, and 
wireless communications. Many blind deconvolution techniques, that are mostly tailored for 
particular applications, have been proposed in these communities. These techniques can be 
divided into two categories based on their general formulation of the problem. The methods 
of the first category typically reduce the blind deconvolution problem to a regularized least 
squares problem without imposing stochastic models on either of the convolved signals. High 
computational cost and sensitivity to noise are the main challenges for these methods. The 
second category of blind deconvolution methods follow a Bayesian approach and consider 
prior distributions for either or both of the signals. An extensive review of the classic blind 
deconvolution methods in imaging can be found in [4]. A survey of the multichannel blind 
deconvolution methods used in communications can be found in [24] as well. 

1.1. Contributions. In recent years, various imaging architectures have been proposed 
that are based on randomly coded apertures. For example, in [18], rapid switching of expo¬ 
sure of cameras in random patterns is leveraged for motion deblurring. Furthermore, in the 
“single-pixel-camera” [8] and compressive hyperspectral imaging systems [22, 13, 15, 17], ran¬ 
domly coded masks are utilized to radically reduce the cost associated with spatial or spectral 
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sampling in traditional imaging systems. In this paper, we consider the blind deconvolution 
problem in a similar imaging architecture that relies on randomly coded masks. As illustrated 
in Figure 1, the considered imaging system first generates different modulated copies of the 
target image, each of which is then blurred by a fixed unknown filter (i.e., the blurring ker¬ 
nel), and finally subsampled by a relatively small number of sensors. Throughout the paper, 
we only consider a uniform subsampling operator in our model. This idealized subsampling 
operator corresponds to an array of single pixel sensors with uniform spacing. However, a 
broader class of subsampling schemes can be treated in a similar fashion. For example, a 
more realistic model for a single sensor is a weighted integrator of a few neighboring pixels. 
Spatially uniform subsampling with an array of these types of sensors can be reduced to the 
pointwise subsampling by absorbing the weight window into the blur filter. We believe that 
our analysis can be extended to even broader class of subsampling schemes, but we do not 
pursue these extensions in this paper. 

As will be discussed in Section 3, in the absence of any model for the blurring kernel 
the subsampling operation renders the unique recovery of the image impossible, regardless of 
the number of measurements acquired. Intuitively, there are two competing requirements for 
the blurring kernel in the considered imaging system. First, it is necessary that the image is 
blurred and spread to the extent that the subsampling sensors do not miss any pixel of the 
image. As will be seen, this requirement can be satisfied by considering a subspace model 
for the blurring kernel. Second, the blurring should not be excessive, otherwise the sampling 
becomes inefficient as different sensors collect (nearly) identical measurements. As will be 
seen in Section 3.2, the severity of the blur can be quantified by the coherence of the blurring 
kernel. Specifically, our results show that the coherence critically affects the sufficient number 
of masks for successful reconstruction through convex optimization. 

We first study identifiability of the problem without restricting the number of measure¬ 
ments. In Section 3.1, using the results of [11] for matrix completion we express a necessary 
and sufficient condition for identifiability which has a combinatorial nature. Often, optical 
models can provide us with some crude approximations of the blurring kernel that can be 
leveraged as prior information. For example, we can conveniently incorporate these prior 
information in a subspace model where the linear span of the available approximations is 
considered as the set of feasible blurring kernels. Our second and more concrete identifiability 
result is obtained by considering a low-dimensional subspace model for the blurring kernel. 
We show that the described blind deconvolution problem is identifiable, if the mentioned low¬ 
dimensional subspace obeys certain conditions. In particular, our results show that if the 
blurring kernel has a sufficiently narrow “bandwidth” then the desired condition holds and 
thus we can uniquely identify the image and the blurring kernel. 

In the second part of our work, we show that, under a “bandpass” blur model, we can 
perform the blind deconvolution through lifting and nuclear norm minimization. This system¬ 
atic approach applies not only to our blind deconvolution problem, but also to a variety of 
other bilinear inverse problems that involve unknown linear operators. The theoretical guar¬ 
antees, which are explained in Section 3.2, rely on construction of a dual certificate for the 
nuclear norm minimization problem via the golfing scheme [9]. Furthermore, the concentra¬ 
tion inequalities recently developed in the field of random matrix theory are frequently used 
throughout the derivations. Finally, while we state our results under the bandpass modelling 
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of the filter, with some effort similar results can be established for the more general subspaces 
described by the identifiability sufficient conditions. 

1.2. Related work. In [5], the PhaseLift method [7] is extended to address the phase 
retrieval problem in coded diffraction imaging, where the measurements have a more intricate 
structure. It is shown that the trace minimization (i.e., PhaseLift) can solve the phase retrieval 
problem, if the randomly coded masks follow certain “admissible” distributions and their 
number is poly-logarithmic in the ambient dimension. The use of coded masks in the phase 
retrieval problem of [5] is effectively similar to that in the blind deconvolution problem we 
address in this paper. However, the measurement model in this paper is different from that 
of [5]. 

In [1], a convex programming technique is proposed for blind deconvolution, where by 
lifting the signal and the filter to their outer product, the problem is cast as reconstruction 
of a rank-one matrix from a set of linear measurements. It is shown in [1] that the nuclear 
norm minimization can robustly and accurately recover the rank-one solution to the convolu¬ 
tion equations. This blind deconvolution technique imposes certain low-dimensional subspace 
structures on the input and channel to reach a well-posed problem. 

More recently, [23] has examined the problem of blind deconvolution in an imaging system 
similar to what considered in this paper. It is shown in [23] that one can recover the image and 
the blurring kernel through lifting and nuclear norm minimization, provided that the number 
of applied masks is greater that the coherence of the blurring kernel by a poly-logarithmic 
factor of the image length. The fact that we consider the effect of subsampling makes the 
imaging model considered in this paper more general than that of [23]. In the special case 
that subsampling is not applied, our problem reduces to that of [23]. In this regime, the 
sufficient number of masks obtained here and in [23] have similar growth order. Only for 
blurring kernels that have more uniform spectrum (i.e., have low coherence), our bounds can 
be slightly worse up to a double logarithmic factor of the image length. We believe that our 
bounds can be further improved using sharper inequalities throughout the derivations, but we 
do not attempt to find the optimal logarithmic factors. 

1.3. Notation. Throughout this paper we use the following notation. Matrices and vec¬ 
tors are denoted by bold capital and small letters, respectively. A superscipt asterisk denotes 
the Hermitian transpose of matrices and vectors (e.g., X*, a?*). More generally, the same 
symbol denotes the adjoint of linear operators (e.g., *4*). Restriction of a matrix X to the 
columns enumerated by an index set J is denoted by Xj. For a vector x, we write x\j to 
denote its restriction to the entries indicated by the index set J. Real and imaginary parts 
of complex variables are denoted by preceding symbols 5ft and S, respectively. Scalar func¬ 
tions applied to vectors or matrices act entrywise. Nullspace and range of linear operators 
are denoted by null (•) and range (•), respectively. Hadamard product (i.e., entrywise product) 
operation on two matrices or vectors is denoted by © symbol. Entrywise conjugate of a matrix 
(or vector) is denoted by putting a bar above the variable (e.g., X is the entrywise conjugate 
of X). We frequently use the normalized Discrete Fourier Transform (DFT) matrix which is 
denoted by F whose size should be clear from the context. Furthermore, F n is used to denote 
for the restriction of F to its first n rows. Moreover, the Z-th column of F * is denoted by 
The DFT of a vector is denoted by the same name with a hat sign atop (e.g., for x E C L , 
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Figure 1: Schematic of the masked imaging system. The reflection of the target image from a 
DMD with a random pattern is blurred by a lens and then subsampled by a few sensors. The 
result is one set of measurements that correspond to the chosen DMD pattern. 


the DFT of x is denoted by x — \^LFx). The diagonal matrix whose diagonal entries form a 
vector x is denoted by D x . Furthermore, the matrix of diagonal entries of a square matrix 
X is denoted by diag ( X ). The vector norms ||-|| for p > 1 are the standard t^-norms. The 
spectral norm, the Frobenius norm, and the nuclear norm are denoted by ||-||, ||-|| F , and ||-H^, 

P P 

respectively. We find it convenient to use the expression / > g (or / < g) as a shorthand for 
inequalities of the form / > cpg (or cpf < g), where cp > 0 is some absolute constant that 
depends only a parameter (3. We drop the superscript in this notation whenever the constant 
factors do not depend on any parameter. 

2. Problem setup. We consider a blind deconvolution problem in an imaging system 
depicted by Figure 1 that involves subsampling. To simplify the exposition, through out the 
paper we consider ID blind deconvolution, but generalization to 2D models is straightforward. 
In our model, multiple binary masks <p k (k — 1, 2,..., K) are applied to an image represented 
by x G one at a time by the means of a Digital Micromirror Device (DMD). The masked 
reflection of the image from the DMD is then blurred through a secondary lens represented 
by a filter h G M L . We model the action of the filter on its input by a circular convolution. 
The masked and blurred image is then subsampled using N < L sensors. Mathematically, the 
described system can be represented by the equations 

m k = GHD<p k x, 

where H is a circulant matrix whose first column is h which models the blurring lens, G is an 
NxL matrix representing the (linear) subsampling operator, and m k is the k- th measurement, 
corresponding to the k- th mask (j) k . In this paper, we exclusively consider uniform pointwise 
subsampling as our subsampling operator G. Note that the above equation for all of the 
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measurements can be written compactly as 

(2.1) M = GHD x <f>, 

where = [ (p 1 <fi 2 • • • 4>k ] is the matrix of masks and AT = [ mi m 2 • • • ttik ] is the 
matrix of measurements. 

Accurate estimates of the blurring kernel h might not be available in practice. For exam¬ 
ple, random vibrations of the lens and the subject relative to each other, turbulence of the 
propagation medium, or errors in measuring the focal length of the lens can preclude accurate 
estimation of the blur. Therefore, it is highly desirable to perform a blind deconvolution for 
reconstruction of both the image and the blurring kernel from the measurements of the form 

(2.1) up to the global scaling ambiguity. 

Because G is in general a wide matrix, recovery of h and x (up to a scaling factor) can 
be ill-posed even with an unlimited number of masks. Therefore, it is worthwhile to study 
the identifiability of our inverse problem under the assumption $ = I. In Section 3.1, we 
elaborate on the conditions under which we can guarantee identifiability. 

In Section 3.2 we introduce a convex program as a systematic method for the blind decon¬ 
volution. To analyze this method, we assume that the blurring kernel follows a “bandpass” 
model that was suggested by the sufficient identifiability conditions. In particular, in Section 
3.2 we assume that h = -^Fjsfh is the blurring kernel for some TV-dimensional vector h. 
Furthermore, to have a realistic model of the system, we assume that the number of masks 
is limited and should be relatively small. While ideal binary masks are {0, l}-valued, for 
technical reasons we consider the elements of to be iid Rademacher random variables that 
take values in {±1} with equal probability. Note that this assumption is not unrealistic as the 
{0,1}-valued masks can be converted to {±l}-valued masks by using an extra all-one mask. 

3. Main results. 

3.1. Identifiability without measurement limitations. In this section we analyze the iden¬ 
tifiability of the image and blurring kernel when arbitrarily large number of measurements are 
available. Therefore, we can assume that <l> is full-rank and has at least as many columns as 
rows. This assumption implies that we can reduce our observation model to 

(3.1) M = GHD Xl 

which is equivalent to (2.1) for $ = I. As mentioned in Section 2, the subsampling matrix 
G is assumed to model a uniform pointwise subsampling. Therefore, each row of G is zero 
except at one entry where it is one. This implies that each entry of the matrix of observations, 
AT, can be expressed as X{hj for certain indices i and j. It is necessary to assume that the 
columns of GH are all non-zero to ensure the information of every pixel of the image is 
retained. Moreover, the observations can also be written as 

(3.2) vec (M) = [xiGj x 2 Gj ... x L Gj] T h, 

where vec (AT) is the columnwise vectorization of AT, G\ = G, and for i > 1, Gi is obtained 
by circularly shifting the columns of Gi -1 to the left. If any of the columns of the matrix on 
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X H 



Figure 2: The bipartite graph constructed based on the observations, 
are connected whose product is observed and non-zero. 


Only pairs of vertices 


the right-hand side of (3.2) is zero, the corresponding entry of h cannot be recovered from the 
observations. Therefore, it is necessary to assume that the columns of the mentioned matrix 
are all non-zero. For the special choice of G that we consider, these two assumptions imply 
that the measurements Xihj for any particular i and similarly for any particular j cannot be 
simultaneously zero. 

The necessary and sufficient condition for identifiability of our blind deconvolution problem 
is a simple special case of the combinatorial identifiability conditions presented in [11] for the 
well-known low-rank matrix completion problem. For completeness, we state the identifiability 
condition in Lemma 1 whose proof is subsumed in the appendix. Let Q — (X, 1-L, £) be an undi¬ 
rected bipartite graph. The vertex partitions X = {aq, ..., xjf) and T~L — {hi, h^, •••, Hl} 
correspond to the entries of x and h , respectively. Furthermore, Q is constructed such that 
{xi,hj} E £ iff the value • hj is observed and is non-zero. An example of such graphs is 
shown in Figure 2. 

Lemma 1 .The rank-one matrix hx T is uniquely recoverable from its subsampled entries iff 
the corresponding bipartite graph has only one connected component of order greater than one . 

Suppose that G models a uniform subsampling with period T < L in (3.1). Then, as 
illustrated in Figure 3, the measurements in (3.1) are identical to the skew diagonal entries 
of the rank-one matrix hx T that are T entries apart in each row (or column). Therefore, 
our deconvolution problem is basically a special rank-one matrix completion problem where 
Lemma 1 applies. As an illustrative example, consider the case that neither x nor h have 
zero entries. The graph associated with the measurements (3.1) is then an TV-regular bipartite 
graph where N — + 1 is the number of sampling sensors (i.e., the number the rows of 

G). If we also have gcd (T,L) = 1, then it is straightforward to verify that the constructed 
graph is connected and by Lemma 1 the matrix hx T can be recovered uniquely. 

Although Lemma 1 establishes the necessary and sufficient condition for identifiability 
of our problem, it is desirable to have alternative guarantees that are not combinatorial in 
nature. The following theorem provides a sufficient condition for identifiability by imposing 
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Figure 3: Measurements given by (3.1) 


i + j = 2 mod 10} 
i + j = 6 mod 10} 
i + j = 0 mod 10} 
as the entries of hx T for L 


10 and T = 4. 


a subspace structure for the blurring kernel. 

Theorem 1 .For N = + 1 let V G C LxN be a given matrix whose restriction to rows 

indexed by 

J i := {j | 1 < j < L and j = — i + 2 + kT mod L for some 0 < k < N — 1} , 

is full-rank for all i = 1, 2,..., L. For any image x 7^ 0 and any blurring kernel h G range (V), 
the rank-one matrix hx T can be uniquely recovered as the solution to the blind deconvolution 
problem (3.1). 

As can be inspected in Figure 3, for each i = 1, 2,..., L the set described in Theorem 1 
determines the location of observed entries in the i-th column of the matrix hx T . Using this 
observation, the proof of Theorem 1, provided in Appendix 3.1, is straightforward. 

Corollary 1 .Let Fq denote a matrix of some N (circularly) consecutive rows of the nor¬ 
malized L-point DFT matrix that are indexed by fl. For any image x ^ 0 and blurring kernel 
h G range (Fq), we can recover the rank-one matrix hx T uniquely from the measurements 
given by (3.1). 

Proof Without loss of generality we assume that £1 = {1, 2,..., N } as the proof is similar 
for other valid choices of £1. The result follows immediately from Theorem 1 should the matrix 
V = F* n = F* N satisfies the requirements of the theorem. Namely, it suffices to show that the 
restriction of F* N to the rows indexed by J i is full-rank for all i = 1, 2,..., L. With uo e 27F1 / L 
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the restriction of F* N to the rows in can be written as 


i?* 

* Ji,N •" 


1 


1 co~ i+1 
1 u T ~ i+1 

l u (N-l)T-i+l 


CJj 


^(N-lXT-i+l) 


Having Fj i ^a = 0 for some a E is equivalent to having the polynomial a (z) := 
^2iLi a i zl ~ 1 vanishing at 2; = ..., which are N distinct points 

in C. This is possible only if a (z) = 0 because the degree of a (z) is less than N. Therefore, 
F*, a = 0 iff a — 0 as desired. ■ 

Ji,N 

While Corollary 1 shows that unique reconstruction of the image and the blurring kernel 
is possible for “bandpass” blurring kernels, it does not provide any robust recovery method. 
Interestingly, there is also a robust recovery method for the bandpass model as described 
below. As in the proof of the corollary, we consider the case of Q — {1, 2 ,..., N} to simplify 
the exposition. Note that the measurement can be written as 


M = GHD X + E = GF^ n D Ji F n D x + E, 


where 


h 7 with slight abuse of our notation, denotes the frequency content of the filter 
h (i.e., h = ^ 2 = F* N h), 


and E denotes the measurement error. Since G is assumed to be a uniform subsampling 
operator, the matrix G := GF ^ is invertible as shown in the proof of Corollary 1 . Therefore, 
we can write 


G AT — D<^F]\[D X + G E — Fjy © 




+ G l E. 


Let Fn be the entrywise conjugate of Fjy. Entrywise multiplication of both sides of the above 
equation by Fjy yields 

F n © = jhx* + F N Q (g -1 #) • 


Therefore, we can estimate hx* as the best rank-one approximation to the matrix 

2-1 . 


Fn © ( G E 


LFn © with estimation error being less than 2 L 

3.2. Blind deconvolution via nuclear norm minimization. In this section we consider a 
convex programming approach for solving ( 2 . 1 ) under the bandpass model for the blurring 
kernel described in 1 . Again for simplicity, we only consider the case that h = -j=^F* N h with 

h being the (truncated) DFT of h. 

Similar to the discussion following ( 1 ) we can rewrite the measurement equation ( 2 . 1 ) as 


M = GHD X ® = GF^D^FnD^. 
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Since G = GF* N is invertible (see proof of Corollary 1 above), it suffices to analyze recover¬ 
ability of h and x from observations 

M := v/f C _1 M = J±D- k F„D„* 


$ 


= )Ik\ FnQ V 1 * 

Define the linear operator A : C NxL —>• <C NxK as 


(3.3) 

whose adjoint is given by 


^l(X) := \I-{F N QX)&, 


A* (Y) = ^F N ®(Y&). 

We have M = A (ix*( Without loss of generality we assume that 

x, the target image, and h, the DFT of the blurring kernel, both have unit 
l 2 -norm. 

Furthermore, we define the coherence of the blurring kernel as 


(3.4) 


/' := 


h 


Ihl 


= L 


h 


We show that the nuclear norm minimization 


(3.5) 


arg min ||-X" H* 
subject to A(X) = M , 


can recover the matrix hx* with high probability. 

Theorem 2. Let $ E {±l} LxK be a random matrix with iid Rademacher entries and de- 

3 

fine the linear operator A as in (3.3). Then, for K > /ilog 2 L log ^ log log (N + 1) we can 

guarantee that (3.5) recovers hx * uniquely, with probability exceeding 1 — 0 ( NL ~^). 

Remark 1. Because h is assumed to have only N active frequency components, the coherence 
is bounded from below as 


^~N' 

Therefore, the bound that the theorem imposes on the number of masks can be simplified to 

3 

K > /a log 2 L log (N + 1) log log (N + 1). 
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Furthermore, the result of Theorem 2 suggests that K > log 2 L log (N + 1) log log (N + 1) 
random masks are necessary for 3.5 to successfully recover the target rank-one matrix. The 
dependence of this lower bound on L may seem unsatisfactory. However, if K < j^, for any 
fixed h the equation (2.1) will be underdetermined with respect to x, thereby x cannot be re¬ 
covered uniquely. Therefore, the number of measurements required by Theorem 2 is suboptimal 
only by some poly-logarithmic factors of L and N. 

Remark 2. To bring robustness to the proposed blind deconvolution approach, we can modify 


(3.5) by replacing the linear constraint with an inequality of the form A(X) — M 


< 5 , 


where M denotes the noisy observations and 5 is a constant that depends on the noise energy. 
Although accuracy of the described convex program can be analyzed as well, we do not attempt 
to derive these accuracy guarantees here and refer the interested readers to [6], [10], and [1] 
for similar derivations. 


4. Numerical experiments. For numerical evaluation of the blind deconvolution via (3.5) 
we conducted two simulations using synthetic data. To solve the nuclear norm minimization 
we used the solver proposed in [2, 3]. In the first experiment we used an astronomical image 
of dimension L — 128 x 128 as the test image. 1 To generate the blur kernel we generated an 
128 x 128 matrix of iid standard normal random variables and then suppressed its 2D DFT 
content outside a square of size N = 43 x 43 centered at the origin. 2 The subsampling of the 
blurred image is performed at the rate of | both vertically and horizontally which provides N 
scalar measurements per applied mask. We computed the subsampled convolution for K = 300 
random Rademacher masks which yields a total of K x N — 554700 scalar measurements. 
The relative error between the target rank-one matrix and the estimate obtained by (3.5) is 
in the order of 10 -7 . Figure 4 also illustrates that the proposed blind deconvolution method 
has successfully recovered the normalized image and the normalized blurring kernel up to the 
prescribed tolerance. 

In the second experiment we considered a more realistic model for the blur kernel. We 
use eight 64 x 64 consecutive slices of a 3D Point Spread Function (PSF) in the Born & Wolf 
optical model that is generated by the PSFGenerator package [12] to create a subspace model 
for the target PSF. We used the default model parameters set by the package, except for the 
size of the PSF array in the XY plane mentioned above. Figure 5 depicts an orthonormal 
basis of the subspace (in magnitude) that we used in the experiment. We chose one of the 
original PSFs as our target PSF. Furthermore, we use a 128 x 128 fluorescent microscopy 
image of endothelial cells as the target shown in Figure 7 (top left). For this experiment, the 
number of applied Rademacher masks is K — 200. The subsampling is uniform in vertical 
and horizontal directions at the rate of |. Therefore, the number of observations per mask 
is N — 16 x 16. To have a reference for comparison, the target image blurred by the target 
PSF and the 16 x 16 subsampled version of the blurred image are shown in the bottom row 
of Figure 7. 

Figure 6 illustrates the target PSF (left), the estimated PSF (center), and the error be- 


1 The image is adapted from NASA’s Hubble Ultra Deep Field image that can be found online at: 
http://commons.wikimedia.org/wiki/File:Hubble_ultra_deep_field_high_rez_editl.jpg 

2 The DFT indices are treated as integers modulo 128. 
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(a) Top row: The original 128x 128 HUDF image (left), reconstructed image (center), and the difference 
between normalized images (right), 

Bottom row: blurred image (left), 3X magnified subsampled blurred image (right) 


xIO " 11 



(b) Spectra of the original 128 x 128 blur kernel (left), the reconstructed blur kernel (center), and the 
difference between normalized blur kernels (right) 

Figure 4: Blind deconvolution of a Hubble Ultra-Deep Field image and a synthetic blur kernel 


tween the normalized target and the normalized estimated PSFs (right). Similarly, the top 
row of 7 illustrates the target image (left), the estimated image (center), and the error between 
the normalized target and the normalized estimated PSFs (right). As can be seen in these 
figures, the proposed blind deconvolution method has found accurate reconstructions of the 
PSF and the image. The relative error in the lifted domain is also in the order of 10 -6 . 
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Figure 5: The orthonormal basis for the PSF subspace shown in magnitude 


xIO ' 9 



Figure 6: The target PSF (left), the reconstructed PSF (center), and the difference between 
the normalized PSFs (right) 


5. Conclusions. In this paper we studied the blind deconvolution problem in an imaging 
system that captures multiple instances of the target image modulated by randomly coded 
masks, blurred, and then spatially subsampled. We first expressed the necessary and sufficient 
condition for identifiability of the problem using a graph representation of the unknowns and 
the measurements. Furthermore, we formulate a different sufficient identifiability condition 
by considering a subspace model for the blurring kernel. Finally, under a special subspace 
model, namely the bandpass model, we derived the sufficient number of random masks that 
allow successful blind deconvolution through lifting and nuclear norm minimization. 

An interesting extension to our work is to consider a subspace model for the image rather 
than the blurring kernel. We believe that this model would relax the identifiability require- 
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Figure 7: Top row: image of fluorescent endothelial cells (left), reconstructed image (center), 
and the difference between the normalized images (right), 

Bottom row: blurred image (left), 8X magnified subsampled blurred image (right) 


ments imposed on the blurring kernel that might not be appropriate in certain scenarios. The 
image subspace model can be further relaxed to sparsity with respect to some basis (e.g., 
wavelet, Fourier, etc). This subspace-sparse model would lead to estimation of simultaneously 
low-rank and row-sparse matrices in the lifted domain. These problems, in their general form, 
are known to be challenging and resistant to usual convex relaxation techniques [16]. How¬ 
ever, we believe that the particular measurement model in the blind deconvolution problem 
can enable us to perform accurate and efficient blind deconvolution. 

Appendix A. Proofs of the results in Section 3.1 . 

In this section we provide the proofs pertaining to the identifiability analysis provided in 
Section 3.1. 

Proof of Lemma 1. The assumptions made in Section 3.1 ensure that the isolated vertices 
in the considered bipartite graph correspond to the entries with value zero. Furthermore, for 
each edge of the graph we observe the product of its end nodes. Therefore, in each of the 
connected components of the graph, choosing the value of only one of the vertices is enough 
to uniquely determine the value of the other vertices of that component. This assignment 
of values does not depend on the other connected components of the graph. Therefore, if 
there are two or more connected components of order greater than one, each of them can take 
independent values. This implies that the corresponding matrix hx T cannot be recovered 
uniquely. ■ 
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Proof of Theorem 1. As mentioned in Section 3.1, we have restricted our problem to sce¬ 
narios where the measurements Xihj for any particular i and similarly for any particular j 
cannot be simultaneously zero. Therefore, the zero entries of x (and also h ) can be easily 
identified from the zero measurements. By the assumption that x ^ 0, there is at least one 
1 < z < L where X{ / 0 . Let h |j. denote the restriction of h to the entries indexed by J^. 
Therefore, we observe the vector X{ h |j which is nonzero. Invoking the assumption that the 
restriction of V to the rows indexed by J i is full-rank we deduce that we can recover the vector 
Xih , uniquely by solving a least squares problem. Then, it is straightforward to recover any 
remaining nonzero xih from the measurements. ■ 

Appendix B. Proofs of the results in Section 3.2 . 


B.l. Tools from probability theory. In this section we provide the definitions and results 
from probability theory that we frequently apply in the proofs. 

Definition 1 .For a convex and non-decreasing function ip : R + —^ R + that satisfies 
ip (0) = 0, the Orlicz ip-norm for a random matrix (or vector) X is defined as 


X\\^ = inf <J zz > 0 | E 




Some important special cases of the Orlicz ^-norms are 

• the Orlicz 2 -norm, also known as the sub-Gaussian norm, denoted by ||*||^ 2 with 

*2 

^2 (t) = e 2 — 1 , and 

• the Orlicz 1 -norm, also known as subexponential norm, denoted by ||-||^ 1 with xpi ( t ) = 
e 1 — 1 . 

Proposition 1 (Matrix Bernstein’s inequality [14, Proposition 2]). Let Xi, X 2 ,..., and X n 

be independent random matrices of dimension d\ x g ?2 that satisfy E [Xf\ — 0 . Suppose that 
for B > 0 we have 


max 
2=1, 2 ,...,22 



<b, 


and define 


a : = max - 


E 


T. x ‘ x ’ 


. 2=1 


E 




. 2=1 


Then, there exist a constant C such that for all t > 0 the tail bound 

' \fnB' 


V 


2=1 


< C max jcry/t + log (di + d 2 ), Slog (* + l°g (di + c? 2 ))| , 


holds with probability at least 1 — e ~ l . 

Proposition 2 (Orlicz norm of a finite maximum [25, Lemma 2.2.2]). Letty be a convex, non¬ 
decreasing, nonzero function that obeys ^ (0) = 0 and limsup^ ip ( x ) ip (y) ( cxy ) < 00 
for some constant c. Then, for any random variables x\, X 2 ,..., and x n we have 


max Xi 

2=1,2,...,22 


< tylp 


-1 


( n) max I 

2 = 1 , 2 ,... ,22 


Xi\ 
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for some constant c^ depending only on f). 

Proposition 3 (Hanson-Wright inequality [21, Theorem 1.1]). Let x G M n be a random vector 
with independent components X{ which satisfy E [xf\ — 0 and ||^z||^ 2 < p . Let A be an n x n 
matrix. Then, for every t > 0, 

P {| x T Ax — E [x t Ax] | > t] < 2e {p 4 II a IIf’p 2| I a H } 


B.2. Proof of Theorem 2. The auxiliary and intermediate lemmas needed for the proof 
of Theorem 2 are subsumed to the latter parts of this section. To prove the theorem we need 
to construct a dual certificate that exhibits certain properties on the “support set” 

(B.l) T = | hv* + ux * | u G C^, v G C L | , 


and its orthogonal complement T^. 

Proof of Theorem 2. Our proof begins by stating the conditions for unique recovery of 
hx * from (3.5) which parallels the arguments in [1, Section 3.1]. Without repeating every 
detail explained in [1], we provide a sketch here for clarification. Recall that h and x are 
assumed to have unit t^-norm. Let 

Vj :S ^ hh*S + Sxx* - hh*Sxx*, V T ± :S (l - titi*) S (I - xx*) 

denote the orthognoal projections onto the subspaces T and T^, respectively. It can be shown 
that (see, e.g. [19]) the matrix hx * is a unique minimizer of (3.5) if there exists a matrix 
Y G range (^l*) such that 


5ft (hx* - V T (Y) , Vj (Z)\ - 5ft (V T r (Y) , V T x (Z)> + \\V T ± (Z )|| # > 0, 


for all Z G null (^4). Applying Holder’s inequality to the first two terms shows that it suffices 
to find a Y G range (*4*) that satisfies 


(B.2) 


hx * - v T (y) ^ \\v T (z)\\ f + (i - \\v T r onii) ii^t-l (nil* > o, 


for all Z G null (A). Using the fact that Z G null (A) and Lemma 2 below which guarantees 


(B.3) 


MPOIll-E \\A(X)\\ 


< - \\x\ 


for all X G T, we can deduce that 


0 = \\A(Z)\\ F > -j= \\Vj(z)\\ F - Mil \\Vjx (Z) 


holds with probability at least 1 — 3L ^ if K > pi log 2 L loglog(A^ + l) for some /3 > 0. 
The bound (B.3) also guarantees that V T ± (Z) = Z — Vj (Z) cannot be the zero matrix. 
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Combining these results and (B.2) shows that it suffices to find a Y G range (*4*) (i.e., the 
dual certificate) that obeys 


(B.4) 

and 

(B.5) 


V2\\A\\ V T (Y)-hx* 


ii?v cnii<4- 


1 

< - 
F 4 


Similar to [ 1 ] we employ the golfing scheme [9] to construct the dual certificate Y. Consider 
a partition of the the index set { 1 , 2 ,..., K} to its disjoint subsets Ki, K 2 ,..., and K p such 
that 

K\ = p Vp€{l,2 

Define the operator restricted to indices in K p as 


(B. 6 ) 


ILP 


A P (X) := \j — (Fn Q X) # Kp - 


Furthermore, we obtain a sequence of matrices l^o = 0 , Y ±,..., Yp through the recursive 
relation 


Yp = Yp -1 + A* p A p (hx* - Vj (Yp- 1 ) 

Our goal is to show that Y = Yp satisfies (B.4) and (B.5) with high probability. With 
(B.7) W p :=Vj(Y p )-hx*, 

projecting both sides of the above recursion onto T yields 

w p = w p - 1 - t t a;a p (w p - 1) 

= (v T - v t a;a p Vj) (w p - 1 ), 

where the latter equation holds because W v -\ G T by construction. Therefore, with 

|K P | = p > n log 2 L log log (N + 1), 
we can invoke Lemma 2 below to guarantee that 

]|FF p || f < - ||FF p _i|| f \/p = 1,2,... P, 


and thus 
(B. 8 ) 


I W 


P\\F — 


< 2~ p 


hx* 


= 2~p Yp = l,2,...,P, 
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hold with probability at least 1 — 3 PL @. This result implies that (B.4) holds for Y — Yp if 

p > io g2 (aV2 ||a||) = ^ + io g2 nan. 


From Lemma 3 below we know that that ||*4.|| < 1 + + yj ^ Xo ^ L with probability at least 

1 — . Therefore, we can deduce that with 


(B.9) 


P > max jl, log ^1 + | , 


(B.4) holds for Y — Yp with probability exceeding 1 — (3 P + 1) L~^. 

To show that Y — Yp also obeys (B.5), we begin by expressing each Yp explicitly in 
terms of the matrices W p as 

p P 

Y P = J2 Y p- y p- i = - E a *p a p ( w p -i) • 

p= l p=i 

Then using the fact that V T ± (W p -i) = 0 for all p = 1, 2 ,..., P we can write 


II7V ( y p)\\ = 


P= 1 

v T i. [ J2 a *p a p (^p-i) - w p- i 

VP =1 


P=1 

p 

<Y,\\ a *p a p( w p-^-w p -i\\. 

p =1 


< 


If the size of each partition K p is sufficiently large and specifically obeys 


(B.10) L H K p | > M log 2 Blog log (iV + 1), 

then we can apply Lemma 4, stated below in Section B.2.3, to simplify the bound on \\Vj± (Yp) || 
and write 

p 

II7V (Xp)II<!E2-»<|, 

p= 1 


which holds with probability at least 1 — cPL & where c is an absolute constant. Therefore, 
if there exists a P that satisfies both (B.9) and (B.10), then (B.4) and (B.5) simultaneously 
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hold for Y — Yp with probability at least 1 — c'PL P for some absolute constant d. To 
guarantee existence of such P, it suffices to have 


p Le 

K>n log 2 L log — log log (N + l), 

fl 


for which we can choose P < log Since /a > L/N we have P < N. The probability of the 
desired events exceeds 1 — 0 ( NL ~^). ■ 

B.2.1. A p is a near isometry on T. We would like to show that the restriction of A p , 
defined by (B.6), to the subspace T in (B.l) has a near isometry behavior. In particular, our 
goal is to show that for all X G T the inequality 


MpWIIf-e ha, poll 


< - \\x\ 


holds with high probability. The following lemma with K = K p establishes the desired 


Lemma 2 (near isometry of Ak on T ).Let be K C {1,2,..., K} be an arbitrary 
and define 


property. 
index set 


(B.ll) 


A PO 



(FN © X) 


For any (5 > 0, if we have 


13 

|K| > n log 2 L log log (N + l), 


then 


Uk(X)\\ 2 f 


E 


IIA POII 



x 


ii 2 

IIf 


for all X € T, with probability at least 1 — 3 L @ 
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Proof. For every X — hv* + tzx* G T we have 


PkPOIIf-e \\Ak(x 


IK 


E tr (( F * © X) (Ml -1) (f n © x) 


ke K 


IK | 


E tr ( ( D h F » D *v + d u f n d x ) (<t> k 4>* k -1) 


ke K 


D v F* N D* n + D* x F* N D* u 


| K | 
L 


ke K 


E tr (D n F N D* v (<Ml - 1) d v f* n d ; 


+ M E tr (DuF n d x (Ml -1) d* x f* n d 


ke K 


+ M E tr ( D h F nD* v (Ml - I) D* x F* n D* u ) 


ke K 


(B.12) 


fceK 


+ M E tr ( d u f n d x (Ml -1) d v f* n d i 


Therefore, for all X G T we can write (B.12) as 

II A ( x )\\f - E [|Mk (X)|&] = T E (Zk,vv*) + T E i 2 '^*) + 4 E ' 

I I ke k I I /cgk I I fceK 

where the summands are expressed using the matrices 

( 

(B.13) Z k = L 


D%F* N D^F N D^ k 




h 

2 \ 


2 1 

L 

/ 


(B.14) 

and 


(B.15) 

Then, the triangle inequality yields 

\\Ak (X)|||,-e[Mk (X 


Z' k = Ldiag (F N D X (Ml - I) D*x F n) , 


Z'l = L( Di t F- N DlD FN(x3 ^ - i*h‘ 


< 


|K| 


E z * 

ke K 


+ 


u 


IK | 


E z i 

ke K 


+ 


2 lltxllo llvl 


|K| 


E z 

ke K 


Without loss of generality we can assume that v is orthogonal to x which implies that 
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and thereby 

(B.16) \\Ak (X)||| 


E 


\\M POII 


< 


|K| 


ke K 


+ 


ke K 


+ 


ke K 


\X\ 


F > 


holds for all IgT. Therefore, it suffices to bound the operator norm of the sums of Z&s, Z' k s, 
and Zj£s, separately. As shown by Lemmas 6 , 7, and 8 in Section B.3, the matrix Bernstein’s 
inequality can be used to establish the desired bounds. It follows from these lemmas that for 


P 


|K| > filog 2 L log log (N + 1), the bound 

Mk(X)|||-e[m k (X)||| 
holds for all X G T with probability at least 1 — 3 L~^. 


< l |pq|, 

- 2 II I IF 


B.2.2. Operator norm of A- The following lemma establishes a global bound for the 
operator norm of A that holds with high probability. 

Lemma 3 (the operator norm of A). For any f3 > 0, the operator norm of A can be bounded 
as 


~ \lF + 


/31ogL 


K 


with probability at least 1 — L 0 
Proof By definition we have 


sup 

x/o 


MOOIh 

lx 


sup 

x/o 


F 

jt ( F N 0 X) & 


X 


< \j — ||<&|| sup 
Xj^o 


I F 

F N 01) 


IXI 


1 


Vk 


|$| 


where the last inequality holds because ||.F/v©X|| f = —5^ ||X|| F . Therefore, we can use 
standard tail bounds for the spectral norm of random matrices with independent sub-Gaussian 
entries to bound ||<fr|| and thus ||^4||. For example, the bound established in [20, Proposition 
2.4] guarantees that for any t > 0 we have 

||$|| < cfVZ + VK^ +t 

with probability at least 1 — 2 e _ct2 , where C and c are absolute constants. Therefore, setting 


t — 


_ / (3 log L+log 2 


we can show that 


||$|| < FZ+ VK+ y/^logL 

holds with probability at least 1 — L~P. This completes the proof. 
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B.2.3. Decay of ||*4**4 P (W p -i) — FF p _i|| with respect to p. Our goal in this section is 
to show that for A p s defined by (B.6) and W p s defined by (B.7), with high probability, the 
quantity ||*4**4 P (W p -i) — W p _i|| decays quickly as p increases. 

Lemma 4 .For p = 0,1,..., P — 1 let W p be defined by (B.7). Then we can choose 

I K p | > n log 2 L log log (IV + 1), 


such that 


\\A* p Ap(Wp-,)-W p -,\\ 

holds for every p simultaneously with probability at least 1 — cPL~P where c > 0 is an absolute 
constant. 

Proof. The action of A p A p on the matrix W p -\ obeys 

a;a p (Wp-,) - w p -i = lf n © ((f n © w p -,) (^L<j> Kp <r Kp - /)) 

= Wl E LF * 0 © Wp- 1) (Ml -1)). 

1 P ' ke Kp 


Since W p -, € T we can find vectors u and v such that 

W p -1 = hv* + ux*, 


which implies that 


A* p Ap (Wp-,) - W p -, = ±-\2 LF NQ((F N Q (hv* + ux *)) (Ml -I)). 

' P ' k£K„ 


Without loss of generality, we also assume that x and v are orthogonal so that 


\W p -,\\ i F = 


hv* 


. || —* ||2 n ||2 . 2 

-f - 11 VLX 11 p — ||v H 2 — I - 11 ^ 11 2 • 


Therefore, on the event that the near isometry of A p on T as stated by the Lemma 2 holds, 
the bound in (B.8) guarantees that 


(B.17) 

Furthermore, we can write 


\u\ 


+ IMIo < 2 ~ 2p . 


IK, 


T Zk + T z k 


i k £Kp 


k£ Kp 


(B.18) 


A* p Ap (Wp-!) - W p -1 
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where 

:= LF N © ((Fjv © (ht>*)) - j)) 

= LD^D Fn ( vQ(j) k )F N D*^ k - hv* 

and 

Z fc := LF N 0 ((Fjv 0 (ux*)) {cj> k (t>i - I)) 

= LD U DF N (xQ4> k )F nD . 

These matrices are very similar to the matrix Z k defined by (B.15). In fact, if we consider 
Z k to be a function of x and h such as Z k (x,h^, then it is easy to verify that 

z fc = (z"(u,h)) , 

and 

Z k = (Z'Ux,u)) T . 

Therefore, we can readily use Lemma 8 in Section B.3 to obtain bounds for the spectral norms 
of the sum of Z k s and the sum of Z k s. To adapt the result of Lemma 8, it suffices to scale 
the deviation bounds by the norm of the vectors u or v and replace the coherence of h by 
that of u, as necessary. 

As explained above, we can use Lemma 8 and (B.18) to obtain 




K, 


keKp 


+ 




fee Kp 


< 


+ max 


JL 

K„ 


log L, 


^//IlogLlog (N + 1) log log (N + 1) 


IK, 


IL ||u|| 


IK, 


log A 


\[l ||u||^ log L log (N + 1) log log (N + 1) 


IK, 


with probability at least 1 — 2 L ;tl . We define a quantity that controls the largest 

row-wise energy of W p - 1, by 

i2 


(B.19) 


Up— 1 •— L 1 | 


00,2 ’ 


where IHI^ 2 denotes the largest row-wise ^-norm. Since u — W p -\x and x is unit-norm, it 
is straightforward to show that 



< 


Vp —l 
L ’ 
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and rewrite the bound on 
||^^p (W„_i) - W„_i | 


||a;a P (wvi)- w p _i|| as 

i n n / nr 7 v^log L log (AT + 1) log log (AT + 1) 

~ max IVjiy log L ’ - K\ - 

v / /yT log -Llog (N + 1) log log (IV + 1) | 
l^pl J 


The inequality (B.17) implies that ||ix|| 2 <2 P and ||v|| 2 <2 P . Furthermore, if we have 

I Kpl > fj, log 2 L log log (IV + 1) , 

then we can invoke Lemma 5 below to guarantee the bound fi v -\ < 2 ~ 2p p with probability 
exceeding 1 — 2 L~^. Therefore, the above deviation bound can be simplified to 


|| A* p Ap (W p -i) - W p _i 


P 

< 2 p max 




yfjji log L log (TV + 1) log log (N + 1) 

|K^I 


The events that the above inequality depends on for every p = 1, 2,..., P, hold simultaneously 
with probability exceeding 1 — cPL~P where c > 0 is an absolute constant. ■ 

B.2.4. Controlling the largest row-norm of W p . Through the following lemma we show 
that the largest row-wise t^-norm of W p decreases significantly as p increases. 

Lemma 5 .For p = 1, 2,..., P, let p p be defined as (B.19). Furthermore, suppose that 


Hp-I < 2 2{p 


Then, with \K p \ > plog 2 L log log (N + 1) we have 

T P < 2 _2p /i, 

with probability at least 1 — 2 L~^. Therefore, we have 

Tp < 2 -2p /L 


simultaneously for all p = 1, 2,..., P with probability at least 1 — 2 PL~^. 

Proof. Let R p := W p - 1 — A*A P (W p -i). Furthermore, denote the Z-th columns of ii*, 
W*_ l7 and W* by r^ p , w^ p - i, and w^ p , respectively. Because W p — Vj (R p ), it follows 
from Lemma 10 in Section B.3 that 




h 

T 

L 



2 


(B.20) 
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We can expand R p as 


R p = w~\ E w p- 1 ~ LF * © (( F n © w p - 1) ct> k <t>i) 




|K 


L Vi - lf n © ((Fn © Wp-!) 4 > k 4 >i) • 


fee K p 


Therefore, we can write 

^ = Tin E (™Ti - 0 fart (n © w;_i))) h 




= |T E - LD ^ F N ((F N © W p _i) 4>k © £) 

1 pl keKp 

= nT E A ((^0Wp-i)4 © *) 


ke K v 


= lib S zt - 


ke K v 


where 

(B.21) := W*p_Ji - LD^Tn ((WM4 © *) 

Conditioned on Ai, A 2 , ■ ■ ■, and Ap-i, we can invoke Lemma 9 below and show that 


R*h 


2 £ 2 —2( P —1) maY / h lo g L M lQ g 2 + !) lQ g 2 lQ g (N + !) lQ g 2 L 


max 


|Kr 


|Kr 


(B.22) 

z l r 'pi i>'pi 

holds with probability exceeding 1 — L~^. Furthermore, we can expand (x,r^ p ) 
{x, r l)P ) = T <*. w i,p~ 1) “ L (*> // © (/; © <l>ki w l,p- 1) 


as 




= jj^T E (*>“ L (/i©^fc) (<£fc> fi ©«h,P-i) 




|K„| E 


/cG Kp 


with 


Cfc := £ (fi © *, fl © «h, P -i) - (*, v>i,p- 1 ) • 


The Orlicz 1-norm of can be bounded using Lemma 12 below as 
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Furthermore, we have 


E 


E k*i 5 

/cE Kp 


= |K P | [l 2 E \{fiQx,(j) k )\ 2 0«J|, P -i)| : 


112 


\(x,wi iP -i 



where Lemma 11 in Section B.3 is used to obtain the inequality. Therefore, the scalar Bern¬ 
stein’s inequality guarantees that 




ke K p 


< 


max 


I II2 y3|Kpi (* + lo § 2 ). ]|^i,p—i II 2 lo S (* + lo g2) 


with probability at least 1 — e \ With t = /3\ogL + log TV, we deduce that 

2 


<x,r I ,„)| a = | 2 - 


k(z Kp 


(B.23) 


< ||w/ )P _i ||2 max 


{ log L log 2 L 


2 f ’ 


holds with probability exceeding 1 — N l L &. 

The inequalities (B.20), (B.22), and (B.23), and the assumption 


L 1 1| 2 ^ l^p— l ^ 


-2(P-1) 


show that 


l II^,pII 2~ 2 2(P 1} M m ax 

+ 2 -2 ( p-1 )^max 


fi log L fi log 2 [N + 1) log 2 log [N + 1) log 2 L 


IK, 


IK, 


f log L log 2 L 


holds with probability at least 1 — 2N 1 L Then using the assumption that \K p 
li log 2 L log log (N + 1) and applying the union bound over l guarantees that 

fi p <2 ~ 2p !i, 


with probability exceeding 1 —2L Finally, since /iq = i/1|o||^o 2 ~ ^ 

application of the above bound guarantees that 


h 


2 

= /i, a recursive 


fi p <2 2p fi , 


for p = 1, 2,... P hold simultaneously with probability exceeding 1 — 2 PL ■ 
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B.3. Supplementary lemmas. The proofs in this section rely on a form of the matrix 
Bernstein’s inequality borrowed from [14] and stated in Proposition 1. 

Lemma 6 .For any (3 > 0 the matrices Z k , defined by (B.13), obey 


ke K 


< 


max 


|\/|K| /xlogL, n log m log i} , 


with probability at least 1 — L @. 

2 


Proof. Since \\Z k \\ < L 


h 


— fi , then 


max||Z t || *<B:= 




log 2 


Furthermore, we have 


E [Z|] = L 2 



h 

4 

h 

4 \ 



4 I - 


2 T 


L 


L 2 


7 


from which we obtain the bound 


E 


Ike K 


< cr 2 := | K| /a. 


Therefore, for any t > 0 the matrix Bernstein’s inequality guarantees that 


ke K 


< C max 1 cry/t + log2L, B log ( ^ ^ J (t + log 2 L) 


a 


= C max |\/|K| /i (t + log2L), log (* + lo g 2L ) 

with probability at least 1 — e~ l . In particular, with t — (3logL we obtain 


ke K 


< 


max 


| y|K|/ilog L, /I log n log l| 


with probability at least 1 — L 

Lemma 7. For any (3 > 0 t/ie matrices Z' k , as defined by (B.lf), obey 


ke K 


/3 

< max 


{v/|K|logL,log 2 L}, 


with probability at least 1 — L @. 
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Proof. Note that Z' k = L 


(- D |F JV ( :E 0^ fe )| 2 



using which we deduce that 


Z' h \\ < L max 

11 1=1,2,...,N 


\(4>k, x © fi )| 2 



Therefore, the Orlicz 1-norm of the right-hand side is an upper bound for that of Z' k . Using 
Proposition 2, we can thus show that 


\Z'u 


©i 


< L log (N + 1) max 
^ Z=l,2,...,iV 


\{4> kl xQ fi)Y 


X 


L 


for some absolute constant c^ 1 > 0 depending only on the function ( u ) 

Lemma 12 below to the latter inequality then yields 



1. Applying 


z 'k\\^ 1 < 8c Va 1°S + !) 


and thereby 


< B -= 8c bi i°g (N + 1). 


Furthermore, we have 


E [Z'l\ = L 2 E 


D 


\F N ( x &ct > k )\ 4 


X 


L 2 


= L 2 \ D 


||*|| 2 T 

EtlFjv^©^)! 4 ] L 2 1 




L 2 L 2 


^ 21 


where the first matrix inequality follows from Lemma 11 below. The above inequality then 
yields 


E 


E z 

Ike K 


< a 2 := 2 IKI 


Applying the matrix Bernstein’s inequality for t > 0 shows that that 


E z '* 

ke K 


< C' max < a 


v / t + log2AT, B log ^ (t + log 2 N) 


C max | a/2 |K| Jt + log 21V), 8c^, 1 log (IV + 1) log 


8c^ log (N + 1) 

V2 


(f+ log2iV) 
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with probability at least 1 — e \ Setting t — /3logL we obtain 


E z i 

ke K 


max 


{ V| K | log L, log L log (N + 1) log log (IV + 1)} 


with probability at least 1 — L & M 

Lemma 8. For any (3 > 0 the matrices Z'^, as defined by ( B.15 ), obey 


ke K 


3 

< max 


| Vm |K| logL, y 7 //log L log (N + 1) log log (IV + 1) j , 


with probability at least 1 — L @. 

Proof. Using the triangle inequality we have 




< L 
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Straightforward bounds for the spectral norm yield 


d%f* n did Fn{x Q<Pk) 
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\Fn (x 0 4>k)\\oo ) 


using which we can write 


< L 
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'k\ |-0i - 

= FF 


;= max^ | (0 fc , * 0 / z )| 


1 


l= Yf* N \{<t>k, x ® fi)\ 


V>i + lo § 2 

i 

+ lo S 2 ‘ 


The Orlicz 1-norm on the right-hand side can be bounded using Proposition 2. Therefore, we 
have 




V’l 


< log (IV + 1) i= max ^ || | {<p k , x © /,) 11|^ 


for some absolute constant c^ 1 > 0 that only depends on the function fii (u) = e u — 1. Lemma 
13 below guarantees that 




Thus, for each k we have 
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or equivalently 
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Furthermore, we have 


E [Zl*Z'l] = L 2 E 
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| hQF jv (x&<t>k) | 2 


— hh 
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which implies that 
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E z "* z 
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< |K|L h = |K| fi 


We also have 
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F* n D 


<t>k r ^• L '|/i0F Ar (xO0 fc )| 2 - FAr - D ^ 


which results in 
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U=i 


— XX 


Eh ©*)l 2 ©«>l 2 
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where we used Lemma 11 below in the second inequality. Therefore, we obtain 


max 
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E z " z 

.fceK 
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E z " z 

jfeeK 


< cr 2 := max {/q 3} | K| . 


Applying the matrix Bernstein’s inequality for t > 0 shows that 

E z 


ke K 


< C" max < ay/t + log (IV + L), S log f ^ ^ ) (t + log (IV + L)) 
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= C" max |a/|K| (t + log JN + L)) max {p, 3}, 

(8c*. + 1) log ( ( ^‘ + l)l°g ( ^ + 1) | (t + log(w + L)) 


log 2 \ log 2 

with probability at least 1 — e~ l . Setting t = /31ogL we obtain 
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with probability at least 1 — . ■ 

Lemma 9 .Let z & and n P -\ be defined as in (B.21) and ( B.19 ), respectively. Furthermore, 
as in Lemma 5, suppose that [i p -\ < 2 -2p+2 /i. Then, conditioned on A±,A 2 ,..., and A p -i, 
for (3 > 0 we have 


z * 

/cG Kp 


P 

< 2 -P+! 


max 


^/|K P | ji log L, y^log (iV + 1) log log (N + 1) log L 


with probability at least 1 — N l L 

Proof. Applying the triangle inequality to (B.21) yields 


(B.24) 




< B : = 


w^h 
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D(t> k F N ((Fn © W p - 1 ) (f k © h.) 


•01 


The first term on the right-hand side of (B.24) can be bounded as 


W*! h 
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w;_ i h 


01 


log 2 
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\W* , I 

| r r P— I | 




log 2 


< 


2~p+i 
log 2 ’ 


where the last inequality follows from the bound ||LF p _i|| f < 2~ p+1 that is established using 
Lemma 2. Let to/ p _i denote the Ltli column of TL*_ 1 . The second term on the right hand 
side of (B.24) can also be bounded as 


D^ k F* N ((Fyv © W p _ i) <f k © h) 


01 


= L 


< L 
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= v^log (N + 1) max ||toj )P _i|| 2 = yjn v -1 log (iV + 1) 
<V^log(iV + l)2- p+1 , 


where the first through sixth lines respectively hold because of the fact that D ( f )k F N a 

||a|| 2 for any a E C N , an elementary property of the spectral norm, Proposition 2, Lemma 
13, the definition of fi p - 1 , and the assumed bound on fi p -\. Therefore, we deduce that B in 
(B.24) obeys 


B < v^log (N + 1) 2~ p+1 . 

To apply the matrix Bernstein’s inequality we also need to upperbound the spectral norm of 
the expectation of the sum of the terms z p Zk, and the sum of the terms z^z\. To bound the 
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first spectral norm we can write 
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Furthermore, the second spectral norm can be bounded as 
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where the first inequality follows from the Cauchy-Schwarz inequality and the fact that 
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h W v -\u 


> o, the second inequality follows from the Holder’s inequality applied to the 


sum inside the expectation, and the third inequality follows from Lemma 11 below. Therefore, 
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we have 


a := max 


E 


J 2 Z *k Z k 

keKp 
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E ZkZ k 

/cE Kp 


< |K P | /i2 _2p+2 . 


Then for any t > 0 the matrix Bernstein’s inequality yields 

J 2 Zk 


ke K„ 


< C max | o\ft + log (L + 1), B log ( — ^ ) (t + log (L + 1)) 
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< 2 P+1 max j y/|K p |/i(i + log (L + 1)), 

^log (N + 1) log log (AT + 1) (t + log (L + 1))} . 
with probability at least 1 — e~ l . Setting t = /31ogL + log N we obtain 

z k <2 _p+1 max | ^/|K P |/rlogL, y^llog (IV + 1) log log (IV + 1) log L 

ke Kp 2 ^ 

with probability exceeding 1 — N _1 L~^ M 

Lemma 10. For any matrix Z G C NxL , the l-th row of Q = Vj (Z) denoted by q*[ obeys 


\Qi\\l< 


h 


Z*h 


+ \(*,zi)\ 2 , 


where zi is the l-th column of Z*. 

Proof. Since Q is the projection of Z onto T we can write 

Q = hh Z (.I - xx*) + Zxx*. 

Therefore, q^ (i.e., the Ath row of Q ) can be written as 

q] = hih*Z (I - xx*) + z* t xx*, 

which implies that 

Il9ill2 = \^\\(I-™*)Z*h\\\\(x, Zl ) 
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Z*h 


+ lfo*i>l 2 


Lemma 11. Tor any pair of vectors a and b, and a Rademacher vector (j) we have 
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In particular, for a — b we have 

E |(a, <f>)\ 4 < 3 ||a ||2 — 2 ||a 


Proof By expanding | (a, <f>) | 2 \ (b , </>)| 2 we obtain 
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2j]|a i | 2 |fo i 
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which is the desired bound. ■ 

Lemma 12 .Let a and b be arbitrary complex L-vectors, and 4> € {±1} L be a Rademacher 
vector with independent entries. Then the random variable (a, </>) (< p , b) — (a, b) is subexpo¬ 
nential and its Orlicz 1-norm obeys 

II («><£) (0, b > - M b )\M ^ IMI 2 ll & ll 2 - 

Specifically, for a = b we have 


\M<t>)\ 2 



"01 
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Proof. We begin by proving similar bounds for real vectors a. 
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where the inequality follows from a variant of the Hanson-Wright inequality stated in Propo¬ 
sition 3. The latter integral becomes smaller than one, by choosing u = C ||a|| 2 || 6|| 2 for a 
sufficiently large absolute constant C. Therefore, we can deduce that 
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While the above result can be readily used for the special case of a = 6, we provide a dif¬ 
ferent proof for this case that does not rely on the Hanson-Wright inequality. The Hoeffding’s 
inequality guarantees that 


P (| (a, 0) | > t) < 2e 2||a|| 2 , 
holds for all t > 0. Therefore, for any u > 0 we have 

= i+ / 
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In particular, for u = 6 ||a|| 2 we obtain 
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which implies 
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< 6 ||a|| 2 . Therefore, using triangle inequality we can deduce that 
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To obtain similar inequalities for complex values of a and b we can simply decompose the 
vectors into their real and imaginary part and apply the triangle inequality. Therefore, we 
obtain 


|| (a, <j>) (<j>, b) - (a, b) ||^ < || <Ka, </>) (<j>, »6> - (5Ra, K6) ||^ 

+ ||<Ka,0) (<f>,Zb)- <Ka,3&>||^ 

+ ||(9a,0) (0,»6)-<9fo,»6)||^ 

+ II 0) (0. ^ b ) ~ $*>) 110! 

<C(||Ka|| 2 + ||9a|| 2 )(||K6|| 2 + ||96|| 2 ) 
— 2C* II Cl||2 H&H2 i 


and 
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|( a ,(/>)| 2 - ||a 


< 




|(5fta, 4>)\ 2 — ||3?a 


bi 
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1(9x1, <^))| 2 — 119a 


bi 


< 


8 ^||$ia|| 2 + ||9a|| 2 ^ = 8 ||a|| 2 , 


which completes the proof.H 

Lemma 13. For a Rademacher vector <fi with iid entries and any given vector a, we have 


a,</>)||L < 8 ||o|| 2 . 


Proof. We first treat the case of real vector a and then obtain the general case from the 
real case. Using the Hoeffding’s inequality we can write 
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' l(q,0H 
e u 
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e u > e ) e dt 


roc 
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/ P (|(a, <fi)\ > tu ) e*d t 

J o 


roc + _ t z u■ 

<1 + 2 e 

Jo 


2 II a II2 dt 

||| /*°° if tu _ \M 2 \ 2 

1 + 2e 2 n 2 / e 2 v Il a ll 2 u ) dt 
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r - 11 a 11 o Wi 

< 1 + 2v 27r-—-e 2 u 2 . 


In particular, at u — 4^/2 ||a|| 2 we have 

l( a >0)l 


E 


e 4 x/2||cx||2 


Jtt j_ 

< 1 + ^“ e64 < 2 > 


which implies that |||(a, </))|||^ i < 4\/2||a|| 2 . 

To obtain the complex version of the inequalities we can simply apply the latter inequality 
to the real and imaginary parts of a. Then, we can write 


III («,</>) I IU< |||(«a,0)|||^+ |||(9a,0)|||^ 

< V2(||3fia|| 2 + ||3a|| 2 ) 

< 8||o|| 2 , 

where the first inequality is the triangle inequality, the second inequality follows from the real 
version shown above, and the third inequality is a simple application of the Cauchy-Schwarz 
inequality. ■ 
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